-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String.Replace overload to do multiple replacements with an array of tuples #29249
Comments
Note that you could also do this with a regular expression, calling |
Your example code iterates over the source string multiple times. That's almost certainly not going to happen - it's far more likely we iterate through the source exactly once, something along these lines:
...note that, unlike your example code, this also prevents replacements from being considered for subsequent matches (which can be the cause of subtle errors). The big question then is whether we want to check the match list for duplicate or subsequence matches? I'm kinda thinking "no", for performance reasons if nothing else. I'd class it as a programmer error, not an "expected" error condition, though, which makes me want to squawk about it. |
@Clockwork-Muse
|
As I said before, this can be a desired feature, such in this case: So, maybe we need a new overload:
Overlapping can be considered False by default in the first overload. Note: |
Note: |
I am going to guess the scenario is something like data cleansing, and you almost never want "chained" replacements, and if you do, you can stick with chaining Assuming you do not, then The ideal implemention (assuming you start with a string) would be to iterate over it (probably using It would seem not unreasonable to me to add that to However if you have the data in a large string, and want to obtain a large string, you may not be particularly concerned with efficiency in the first place. If you were, you would be manipulating buffers already. And if your strings are not so large, the allocations are not so large either. @VBAndCs how large are your buffers and why are you using strings? |
@danmosemsft |
By the way, XElement behaves bad with xml nodes that contains literal strings beside xml elements, as it deletes all white spaces including line separators in this case, regardless of the reader ane writer settings! Dim cshtml = <cshtml>
<p>Test</p>
@foreach (var i in items)
{
<h>item: </h>
<div>
<p>i</p>
</div>
}
<p>End Test</p>
</cshtml>
Dim S = cshtml.ToString() This is the resulting string : <cshtml>
<p>Test</p>
@foreach (var i in items)
{
<h>item: </h><div><p>i</p></div>
}
<p>End Test</p></cshtml> The format is damaged because the c# string!. If you omitted it, every thing will work fine! Why? |
If you believe there is an issue in XElement, please open a separate issue.
Not necessarily, particularly when that change impacts everyone (even if they don't use it - binaries get larger, there is more to maintain, etc). The question is whether there are common cases where this proposed API is worth it. I think there probably is but it's not certainly true.
That would need API review, you can open a separate proposal (see format). Again one of the questions will be: is this common enough and useful enough to be worth the cost of implementation and maintenance, binary size, documentation, and potential user confusion (which should I use?). In this case I think probably not as it's trivial to replace with empty string. In the original case for this issue, the best argument is that we could do it more efficiently than the user can easily do. |
@danmosemsft public static string Replace(string s, params (string findStr, string repWith)[] replacements)
{
if (string.IsNullOrEmpty(s) || replacements == null || replacements.Length == 0)
return s;
var newStr = new MyLinkedList<char>(s);
int strLenght = s.Length;
foreach (var rep in replacements)
{
int start = 0;
while (start < strLenght)
{
var L = rep.findStr.Length;
if (L + start > strLenght)
break;
int pos = 0;
for (; pos < L; pos++)
{
if (rep.findStr[pos] != newStr[start + pos])
break;
}
if (pos == L) // A match found
{
newStr.Replace(start, L, rep.repWith);
strLenght = newStr.Count;
start += rep.repWith.Length -1;
}
start += 1;
}
}
return new string(newStr.ToArray());
} using System;
using System.Collections.Generic;
class MyLinkedList<ListType> where ListType : IComparable
{
public class Cell
{
public ListType Value;
public Cell NextCell;
public Cell()
{
}
public Cell(ListType Value, Cell NextCell)
{
this.Value = Value;
this.NextCell = NextCell;
}
}
public MyLinkedList()
{
}
public MyLinkedList(IEnumerable<ListType> items)
{
Add(items);
}
private Cell FirstCell, LastCell;
public Cell Add(ListType Item)
{
Cell C = new Cell() { Value = Item };
if (pCount == 0)
// يجب إنشاء نسخة جديدة من خانة البداية
FirstCell = C;
else
LastCell.NextCell = C;
LastCell = C;
pCount += 1;
return C;
}
private int pCount = 0;
public int Count => pCount;
private int lastReachedCellIndex = -1;
private Cell lastReachedCell = null;
private Cell GetCell(int Index)
{
try
{
if (Index < 0 || Index >= pCount)
throw new Exception("رقم العنصر غير صالح");
Cell c; int i;
var step = Index - lastReachedCellIndex;
if (lastReachedCell != null && step>-1 && step < Index)
{
c = lastReachedCell;
i = lastReachedCellIndex;
}
else
{
c = FirstCell;
i = 0;
}
while (i != Index)
{
c = c.NextCell;
i += 1;
}
lastReachedCell = c;
lastReachedCellIndex = i;
return c;
}
finally
{
}
}
public ListType this[int Index]
{
get
{
try
{
return GetCell(Index).Value;
}
finally { }
}
set
{
try
{
if (Index == pCount)
Add(value);
else
GetCell(Index).Value = value;
}
finally { }
}
}
public void Add(IEnumerable <ListType> Items)
{
foreach (var item in Items)
Add(item);
}
public void Add(MyLinkedList<ListType> NewList)
{
if (pCount == 0)
{
FirstCell = NewList.FirstCell;
LastCell = NewList.LastCell;
}
else
{
LastCell.NextCell = NewList.FirstCell;
LastCell = NewList.LastCell;
}
pCount += NewList.pCount;
}
public void CopyList(MyLinkedList<ListType> NewList)
{
Cell C = NewList.FirstCell;
while (C != null)
{
this.Add(C.Value);
C = C.NextCell;
}
}
public MyLinkedList<ListType> SubList(int St)
{
try
{
var L = new MyLinkedList<ListType>();
L.FirstCell = GetCell(St);
L.LastCell = LastCell;
L.pCount = pCount - St;
return L;
}
finally { }
}
public void Insert(ListType Item, int Index)
{
try
{
if (Index == pCount)
{
lastReachedCell = Add(Item);
lastReachedCellIndex = Index;
}
else
{
Cell C = new Cell(Item, GetCell(Index));
if (Index == 0)
FirstCell = C;
else
GetCell(Index - 1).NextCell = C;
pCount += 1;
lastReachedCell = C;
lastReachedCellIndex = Index;
}
}
finally { }
}
public void InsertList(MyLinkedList<ListType> NewList, int Index)
{
try
{
if (Index == pCount)
Add(NewList);
else
{
if (Index == 0)
FirstCell = NewList.FirstCell;
else
GetCell(Index - 1).NextCell = NewList.FirstCell;
}
NewList.LastCell = GetCell(Index);
pCount += NewList.Count;
}
finally
{
lastReachedCell = null;
lastReachedCellIndex = -1;
}
}
public void CopyList(MyLinkedList<ListType> NewList, int Index)
{
Cell C = NewList.FirstCell;
while (C != null)
{
Insert(C.Value, Index);
C = C.NextCell;
}
}
public void InsertArray(ListType[] Items, int Index)
{
var L = new MyLinkedList<ListType>();
L.Add(Items);
InsertList(L, Index);
}
public void Clear()
{
FirstCell = null;
LastCell = null;
pCount = 0;
lastReachedCell = null;
lastReachedCellIndex = -1;
}
public void Replace(int atStart, int length, IEnumerable<ListType> items )
{
var c = GetCell(atStart);
var i = 0;
foreach (var item in items)
{
if (i < length)
{
c.Value = item;
c = c.NextCell;
}
else
Insert(item, atStart + i);
i++;
}
if (i < length)
RemoveRange(atStart+ i, length - i );
}
public void RemoveAt(int Index)
{
try
{
if (Index == 0) // حذف أول عنصر
if (pCount == 1)
{
// العنصر الأول هو العنصر الأخير وسيتم حذفه
FirstCell = null;
LastCell = null;
}
else // جعل العنصر الثاني هو العنصر الأول
FirstCell = FirstCell.NextCell;
else if (Index == pCount - 1)
{
// حذف آخر عنصر بجعل العنصر السابق له هو العنصر الأخير
LastCell = GetCell(Index - 1);
LastCell.NextCell = null;
}
else
{//حذف العنصر الحالي، بجعل العنصر السابق يشير إلى العنصر التالي
var prev = GetCell(Index - 1); // العنصر السابق
prev.NextCell = prev.NextCell.NextCell;
}
pCount -= 1;
}
finally
{
lastReachedCell = null;
lastReachedCellIndex = -1;
}
}
public void RemoveLastItem()
{
try
{
if (pCount == 1)
{
FirstCell = null;
LastCell = null;
lastReachedCell = null;
lastReachedCellIndex = -1;
}
else
{
LastCell = GetCell(pCount - 2);
LastCell.NextCell = null;
}
pCount -= 1;
}
finally { }
}
public int RemoveRange(int St, int DelCount)
{
try
{
if (St + DelCount > pCount)
DelCount = pCount - St;
int nxtId = St + DelCount;
if (St == 0)
if (pCount == DelCount)
{
FirstCell = null;
LastCell = null;
}
else
FirstCell = GetCell(DelCount);
else if (nxtId == pCount)
{
LastCell = GetCell(St - 1);
LastCell.NextCell = null;
}
else
GetCell(St - 1).NextCell = GetCell(nxtId);
pCount -= DelCount;
return DelCount;
}
finally
{
lastReachedCell = null;
lastReachedCellIndex = -1;
}
}
private Cell pCurCell;
public ListType CurItem
{
get
{
try
{
if (pCurCell == null)
throw new Exception("لا توجد خانة حالية");
return pCurCell.Value;
}
finally { }
}
}
public bool MoveFirst()
{
if (FirstCell == null) return false;
pCurCell = FirstCell;
return true;
}
public bool MoveNext()
{
if (pCurCell == null)
throw new Exception("لا توجد خانة حالية");
if (pCurCell.NextCell == null) return false;
pCurCell = pCurCell.NextCell;
return true;
}
public bool MoveLast()
{
if (LastCell == null) return false;
pCurCell = LastCell;
return true;
}
public bool MoveTo(int Index)
{
try
{
pCurCell = GetCell(Index);
return true;
}
catch
{
return false;
}
}
public ListType FirstItem
{
get
{
if (FirstCell == null)
throw new Exception("القائمة فارغة");
return FirstCell.Value;
}
set
{
if (FirstCell == null)
throw new Exception("القائمة فارغة");
FirstCell.Value = value;
}
}
public ListType LastItem
{
get
{
if (LastCell == null)
throw new Exception("القائمة فارغة");
return LastCell.Value;
}
set
{
if (LastCell == null)
throw new Exception("القائمة فارغة");
LastCell.Value = value;
}
}
public ListType[] ToArray(int St = 0, int En = -1)
{
try
{
if (En == -1 || En >= pCount)
En = pCount - 1;
ListType[] A = new ListType[En - St + 1];
Cell C = GetCell(St);
for (int i = 0; i <= En - St; i++)
{
A[i] = C.Value;
C = C.NextCell;
}
return A;
}
finally { }
}
} |
@VBAndCs what is the "best" solution of course depends on the use case. Your approach looks like it would allocate a lot (a new reference type for every char) and I think would be quite slow in general. I am confident that a reasonable implementation could be created, there is no need to create one unless and until the API is approved. We have a formal process for that. You would want to edit your top post to follow the format and then discussion can happen. |
So, we are going back to square 1: Function Replace(s As String, ParamArray repPairs() As (repStr As String, repWithStr As String)) As String
For Each x In repPairs
s = s.Replace(x.repStr, x.repWithStr)
Next
Return s
End Function It just offers a new compact syntax for the Replace method, with no performance improvements! |
...I feel like what you really want to make for your code is a separate parser, then build (and write) the HTML DOM. Which would save you trying to do string replacements (which are not only slow, but can bite you in these kinds of situations, as you're discovering), among other things. Otherwise, you might have better (overall) results if you did a regex match, then iterated over those results (being careful to look for overlapping matches). |
@Clockwork-Muse |
@VBAndCs to make progress on this issue, we will need the top post edited to match the API review guidelines, as linked above. |
After this discussion and trials, String.Replace seems the best choice for now, so I am closing this. Thanks. |
...I mean, even if it doesn't work for your use case, it's probably something we should look at. |
Ok, But I made a fast test and it turned out that the successive calls to the String.Replace (which uses SB to build the result) is faster than using successive StringBuilder.Replace or LinkiedList appraoch! |
Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process. This process is part of our issue cleanup automation. |
This issue will now be closed since it had been marked |
I wrote this code:
Using successive replacements creates many visions of the string, which is less efficient, so I wrote it using tuples like this:
Which is more compact and more efficient. It can be implemented with this extension method:
I suggest to add this Replace overload to the string class.
The text was updated successfully, but these errors were encountered: